Nonconvexity of the relative entropy for Markov dynamics:
A Fisher information approach 

Matteo Polettini and Massimiliano Esposito

Complex Systems and Statistical Mechanics, University of Luxembourg, Campus Limpertsberg, 

162a avenue de la Faiencerie, L-1511 Luxembourg (G. D. Luxembourg) 

(Dated: April 24, 2013) 

We show via counterexamples that relative entropy between the solution of a Markovian master 
equation and the steady state is not a convex function of time. We thus let down a curtain on a 
possible formulation of a principle of thermodynamics regarding decrease of the nonadiabatic en- 
tropy production. However, we argue that a large separation of typical decay times is necessary for 
nonconvex solutions to occur, making concave transients extremely short-lived with respect to the 
main relaxation modes. We describe a general method based on the Fisher information matrix to 
discriminate between generators that do and don't admit nonconvex solutions. While initial condi- 
tions leading to concave transients are shown to be extremely fine-tuned, by our method we are able 
to select nonconvex initial conditions that are arbitrarily close to the steady state. Convexity does 
occur when the system is close to satisfying detailed balance, or more generally when certain normality
conditions of the decay modes are satisfied. Our results circumscribe the range of validity of a 
conjecture proposed by Maes et al. [Phys. Rev. Lett. 107, 010601 (2011)] regarding monotonicity 
of the large deviation rate functional for the occupation probability (dynamical activity), showing 
that while the conjecture might still hold in the long time limit, the dynamical activity is not a 
Lyapunov function. 
I. INTRODUCTION



The quest for general variational principles of thermo- 
dynamics and for arrows of time far from equilibrium 
leads researchers to sieve the behavior of several ensemble 
and path observables, in order to establish the stability 
of steady states, describe fluctuations out of them, and 
characterize evolution towards them [iHZ]. In the context
of the probabilistic formulation of thermodynamics
in terms of Markov processes [HtilOj, blending aspects
of information theory and thermodynamics, relative en- 
tropy with respect to the steady state naturally draws the 
inquirer's attention, having a threefold role: a dynamic 
one as a Lyapunov functional [S], a thermodynamic one 
as a nonadiabatic contribution to the entropy produc- 
tion [n] [12] , and a statistical one as a tool for parameter 
estimation [T31 [T3] . Along these lines, many may have 
conducted systematic research on the hypothesis that 
relative entropy is a convex function of time along the 
solution of a Markovian master equation, at least not 
too far from the steady state. Indeed, this is a tempting 
hypothesis, in that it would make for a new principle of 
thermodynamics, analogous to the principle of minimum 
entropy production [21 US] . 

In this paper we will work with continuous-time, discrete-state space stationary Markov processes, described by a master equation whose solution $p(t)$ tends asymptotically to a unique steady state $p^*$. Along this solution, relative entropy with respect to the steady state is defined as

$$H(t) = \sum_i p_i(t)\, h_i(t), \tag{1}$$

where we refer to

$$h_i(t) = \ln \frac{p_i(t)}{p_i^*} \tag{2}$$

as the relative self-information.

Relative entropy is positive when $p(t) \neq p^*$, and it decreases monotonically to zero; hence it is a proper Lyapunov function [5J [T^. For systems whose transition rates satisfy the condition of detailed balance, affording an equilibrium steady state with no net circulation of currents, relative entropy is convex when the system is sufficiently close to the steady state, i.e. in the linear regime. For this class of systems there exists an energy function and an environment temperature $T$, such that $dF = T\,dH$ is consistently identified as a free energy increment (setting Boltzmann's constant $k_B = 1$). In this case the entropy production is a state function $dS_i = -dH$, describing in many respects the system's thermodynamics. Monotonicity of the relative entropy corresponds to a positive entropy production rate $\dot S_i \geq 0$, i.e. to the second law of thermodynamics. The entropy production rate vanishes at equilibrium, $\dot S_i = 0$, where no irreversible fluxes occur. Convexity of the relative entropy in the linear regime yields the stability criterion $\ddot S_i \leq 0$, which constitutes a version of the minimum entropy production rate principle [3]. Hence, the thermodynamics of systems relaxing to equilibrium states is fully encoded in the behavior of the relative entropy.

For autonomous nonequilibrium systems, whose generator does not depend explicitly on time, (minus) the time derivative of the relative entropy is not a fully satisfactory concept of entropy production rate, as one expects that nonequilibrium steady states should display a non-null steady flux of entropy towards the environment. It can still be interpreted as a nonadiabatic contribution $dS_{na} = -dH$ to the total entropy production $\delta S_i = \delta S_a + dS_{na}$, owing its name to the fact that, when the system is perturbed on time-scales that are longer than the spontaneous relaxation times of the system (adiabatic limit), this contribution approximately vanishes [T^. It has also been interpreted as a sort of nonequilibrium free energy for systems subject to nonequilibrium forces such as chemical potential gradients, in an isothermal environment [17]. Along the lines of research developed by Schnakenberg |^, an adiabatic term $\delta S_a$ is added to the nonadiabatic one. It accounts for a flux of entropy from the system to the environment, due to departure from the condition of detailed balance. The adiabatic/nonadiabatic splitting of Schnakenberg's entropy production is particularly useful to characterize the second law when non-autonomous (nonstationary) Markovian evolution is considered [18], in which case one shall also account for a work term [T5].

In general, the total entropy production rate is not a state function, reflecting the well-known fact that irreversibility along nonequilibrium processes is characterized by inexact differentials that do not integrate to zero along closed paths, a notable example being Clausius's formulation of the second law $\oint \delta Q/T \geq 0$ (see Ref. [20] for a discussion in the context of Schnakenberg's theory). For this reason we distinguished between the exact and the inexact differentials $d$ and $\delta$. Moreover, while positive in accordance with the second law of thermodynamics, the total entropy production rate presents no apparent regularity in its time evolution. In particular it does not approach its steady value monotonically.

Then, convexity of the relative entropy, at least in 
the linear regime, is an intriguing hypothesis, in that 
it would make for a minimum principle of a nonequilib- 
rium state function, amending the unpredictable behav- 
ior of the entropy production rate. The principle would 
state that the nonadiabatic rate of entropy production 
decreases monotonically to zero, regardless of the concomitant
spontaneous arrangement of heat fluxes, matter
fluxes, charge currents etc.



A. Results and plan of the paper 



Is relative entropy convex? We answer in the negative, providing in Sec. I B a counterexample for a simple three-state system with real spectrum of the generator. In Sec. II A we show that convexity violation can occur with initial conditions picked arbitrarily close to the nonequilibrium steady state. By close we mean that the second order term in the expansion of the relative entropy captures the full dynamical behavior. However, we argue in Sec. II C that convexity violation is rare and short-lived. It requires a wide separation of time scales and a very fine tuning of the initial conditions. Moreover, an extensive numerical search did not allow us to find counterexamples for three-state systems with complex spectrum, which might indicate that convexity is even more robust when some eigenmodes have an oscillatory character.

Throughout the paper we employ a method based on the Fisher covariance matrix to characterize generators that violate convexity. Systems with real spectrum are described in Sec. II A. We discuss the special case of equilibrium systems in Sec. II B. For sake of completeness, the general theory for systems with complex spectrum is analyzed in Appendix A. Convexity still holds near the steady state for a special class of "normal" systems, including detailed balanced systems, as Maes et al. discussed [5]; we recast this result in our formalism.

A connection between the second time derivative of 
the relative entropy and the first time derivative of the 
dynamical activity near the steady state is established in 
Sec. III A. This allows us to discuss the range of validity
of a conjecture by Maes et al. regarding the monotonicity 
of the dynamical activity |5J |5] . 

If present at all, nonconvex transients of the relative 
entropy and nonmonotone transients of the dynamical ac- 
tivity prelude to a final regime dominated by the mode 
with slowest decay rate. This regime is trivially con- 
vex for systems with real spectrum, while in the complex 
case one encounters interesting complications, leading to 
conditions on the real and imaginary parts of complex 
conjugate eigenvalues. We briefly pursue this discussion 
in Sec. III B.

Finally, in Sec. III C we briefly discuss the rationale behind the use of the Fisher information measure, before
drawing conclusions.



B. Counterexample 



Consider the continuous-time Markovian generator

$$W = \begin{pmatrix} -401 & 1 & 1 \\ 400 & -2 & 1 \\ 1 & 1 & -2 \end{pmatrix} \tag{3}$$

with steady state

$$p^* = (3, 801, 402)/1206. \tag{4}$$

Notice that the system is strongly unbalanced, with one overwhelmingly large rate. As a consequence, one state is almost neglected, its occupancy probability falling rapidly to a value near zero. We choose as initial density

$$p = (0.002, 0.464, 0.534). \tag{5}$$






We propagate $p$ in time via $p(t) = \exp(tW)\,p$, and evaluate relative entropy with respect to the steady state. The plot of $\ddot H(t)$ in Fig. 1 (bolder line) clearly becomes negative for a short transient time.




FIG. 1: Second time derivative of the relative entropy as a function of $t$, with different initial conditions: (a) As in our counterexample, $p^{(a)}(0) = p$; (b,c) Perturbed along the slow mode, with $p^{(b)}(0) = p + 0.1\, q_-$ and $p^{(c)}(0) = p - 0.1\, q_-$; (d,e) Perturbed along the fast mode, with $p^{(d)}(0) = p - 0.001\, q_+$ and $p^{(e)}(0) = p - 0.005\, q_+$.



The above generator has a real spectrum, with eigenvalue zero relative to the steady state and two negative eigenvalues $\lambda_+ = -402$ and $\lambda_- = -3$ determining respectively fast and slow exponential decays. Hence, the system displays a large separation between typical decay times. The corresponding eigenvectors are:

$$q_+ \approx (-1, 1, 0), \qquad q_- \approx (0, -1, 1). \tag{6}$$

If we perturb the initial condition along the mode with the slower decay rate $q_-$, even large perturbations do not suffice to restore convexity (see curves (b) and (c) in Fig. 1). However, if we perturb the initial condition along the mode with the faster decay rate $q_+$, even slight perturbations do (Fig. 1, curves (d) and (e)). Thus a very precise fine-tuning of the initial conditions must be attained to generate a counterexample. Moreover, since the dynamics damps the fastest mode first, the concave regime is extremely short-lived, with a typical survival time of order $\tau \sim |\lambda_+|^{-1}$. Finally, the large separation of time scales implies that mapping the initial state back in time with $\exp(-tW)$ leads very soon to nonphysical solutions (negative probabilities), as the fastest eigenmode would now dominate. Reversing the argument, "typical" dynamics will not pass by state $p$, which has to be specifically selected.
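The counterexample can be checked numerically. The following is a minimal Python sketch, not the authors' code. It takes the eigenvalues and eigenvectors quoted around Eq. (6) as exact for this generator (an assumption that can be verified by applying `matvec` to them), writes $p(t)$ in closed form, and evaluates $\ddot H(t) = \sum_i [\ddot p_i \ln(p_i/p_i^*) + \dot p_i^2/p_i]$, which follows from differentiating Eq. (1) twice and using $\sum_i \dot p_i = 0$. All function and variable names are illustrative.

```python
import math

# Generator of Eq. (3): entry W[i][j] is the rate for jumps from state j to i.
W = [[-401.0, 1.0, 1.0],
     [400.0, -2.0, 1.0],
     [1.0, 1.0, -2.0]]
P_STAR = [3 / 1206, 801 / 1206, 402 / 1206]   # steady state, Eq. (4)
P0 = [0.002, 0.464, 0.534]                    # initial density, Eq. (5)
LAM_FAST, LAM_SLOW = -402.0, -3.0             # nonzero eigenvalues
Q_FAST = [-1.0, 1.0, 0.0]                     # fast mode q_+, Eq. (6)
Q_SLOW = [0.0, -1.0, 1.0]                     # slow mode q_-, Eq. (6)


def matvec(m, v):
    """Matrix-vector product for small dense lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def mode_coefficients(p0):
    """Coefficients (c_+, c_-) such that p0 = p* + c_+ q_+ + c_- q_-.

    Only q_+ has a first component and only q_- has a third one, so the
    two coefficients can be read off directly (p0 must sum to one).
    """
    return P_STAR[0] - p0[0], p0[2] - P_STAR[2]


def ddot_H(t, p0=P0):
    """Exact second time derivative of H(t) = sum_i p_i ln(p_i / p*_i)."""
    c_fast, c_slow = mode_coefficients(p0)
    a = c_fast * math.exp(LAM_FAST * t)
    b = c_slow * math.exp(LAM_SLOW * t)
    total = 0.0
    for i in range(3):
        p = P_STAR[i] + a * Q_FAST[i] + b * Q_SLOW[i]
        dp = LAM_FAST * a * Q_FAST[i] + LAM_SLOW * b * Q_SLOW[i]
        ddp = LAM_FAST**2 * a * Q_FAST[i] + LAM_SLOW**2 * b * Q_SLOW[i]
        total += ddp * math.log(p / P_STAR[i]) + dp * dp / p
    return total


def concave_window(t_max=0.01, steps=1000):
    """Grid times in [0, t_max] at which the second derivative is negative."""
    grid = [t_max * k / steps for k in range(steps + 1)]
    return [t for t in grid if ddot_H(t) < 0.0]
```

Calling `concave_window()` returns a short run of times of the order of a few $|\lambda_+|^{-1}$, with a positive second derivative both before and after it, which is the short-lived concave transient described in the text. The closed-form propagation is used here only to avoid a matrix exponential; it relies on the assumption that Eq. (6) holds exactly.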



II. THEORY AND RESULTS: REAL SPECTRUM



A. Conditions for convexity violations 

We describe in this section a general algebraic procedure
that allows one to discriminate between generators that
do or don't admit initial conditions violating convexity.

Let us consider a Markovian continuous-time evolution on $n$ states, with irreducible rates $W_{ij}$ for jumps from state $j$ to $i$ admitting a unique steady state $p^*$. It is known that the $n-1$ non-null eigenvalues $\lambda_a$ of $W$ have a negative real part with units of an inverse time $-1/\tau_a$, characterizing relaxation. We consider in this section generators with real non-degenerate spectrum, which afford a complete set of independent eigenvectors. We will discuss defective generators, affording a nondiagonal Jordan normal form, in Sec. II D. The eigenvalue equations read

$$W q^a = \lambda_a q^a, \qquad W p^* = 0. \tag{7}$$

Diagonalizing the propagator $U(t) = \exp(tW)$, we obtain for the time-evolved distribution

$$p(t) = p^* + \sum_a e^{t\lambda_a} c_a q^a, \tag{8}$$

where $c = (c_1, \ldots, c_{n-1})$ is a real vector, specifying the initial state of the system. Let $c_a(t) = c_a \exp(\lambda_a t)$. We expand the relative entropy to second order, obtaining

$$H(t) \approx \sum_{a,b} g^{ab}\, c_a(t)\, c_b(t), \tag{9}$$

where

$$g^{ab} = \frac{1}{2} \sum_i \frac{q_i^a q_i^b}{p_i^*} = \frac{1}{2}\, (q^a, q^b). \tag{10}$$



The right-hand side defines a scalar product $(\,\cdot\,,\,\cdot\,)$. Properties of the matrix $g^{ab}$, especially regarding equilibrium systems, are well known I65 Sec. 5.7.]. Here it will be called Fisher matrix for reasons that are rooted in estimation theory, and that will be explained in Sec. III C. It is a Gramian matrix, i.e. its entries are obtained as scalar products among vectors. When vectors $q^a$ are independent, as under our assumptions, Gramian matrices are positive definite [22]. The Fisher matrix can then be seen as a realization in local coordinates of a metric on the space of statistical states; some applications to nonequilibrium decay modes have been discussed by one of the authors in Ref. [21].

We introduce the negative-definite diagonal matrix of eigenvalues

$$\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_{n-1}\}. \tag{11}$$

Let $c(t) = e^{t\Lambda} c$, and let $G$ be the matrix with entries $g^{ab}$. Taking twice the time derivative of the relative entropy, to second order, we obtain

$$\ddot H(t) = c(t)^T\, \overbrace{\left( 2\Lambda G \Lambda + \Lambda^2 G + G \Lambda^2 \right)}^{K}\, c(t). \tag{12}$$

The overbrace is used to define a bilinear symmetric form $K$, whose first contribution $2\Lambda G \Lambda$ is positive definite. However, $K$ itself might admit at least one eigenvector $k$ relative to a negative eigenvalue. When this is the case, the choice of initial conditions $c \propto k$ yields an initially negative second time-derivative of the relative entropy.
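Since $\Lambda$ is diagonal, the quadratic form of Eq. (12) can be written entry by entry. The short derivation below is an added worked step, not part of the original text; it uses only the definitions in Eqs. (10)-(12).

```latex
\begin{align}
K^{ab} &= 2\lambda_a g^{ab} \lambda_b + \lambda_a^2 g^{ab} + g^{ab} \lambda_b^2
        = (\lambda_a + \lambda_b)^2\, g^{ab}, \\
K^{ab} - 4 \lambda_a g^{ab} \lambda_b &= (\lambda_a - \lambda_b)^2\, g^{ab}.
\end{align}
```

The second line shows that $K$ differs from the positive definite form $4\Lambda G \Lambda$ only through off-diagonal terms weighted by squared eigenvalue differences. A negative eigenvalue of $K$ therefore requires both distinct decay rates and non-orthogonal modes ($g^{ab} \neq 0$ for some $a \neq b$).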



Moreover, since the length of c can be made small at will, 
we can select initial states that are arbitrarily close to the 
steady state, still displaying violation of convexity, and 
fulfilling the second-order approximation to any degree of 
accuracy. Since $K$ is known to be positive for particular systems, by continuity a negative eigenvalue of $K$ can only occur if there exist generators such that

$$\det K = 0. \tag{13}$$

Notice that $K$ is built out of the eigenvalues and eigenvectors of the generator, which are expressed in terms of transition rates. Hence Eq. (13) identifies an algebraic set within the set of allowed rates.

To recapitulate, the search for nonconvex generators 
is reduced to an algebraic polynomial equation, whose 
difficulty can be tuned at will by suitably parametriz- 
ing transition rates. Once a generator with at least 
one negative eigenvalue of K is found, one can solve 
the eigenvalue/eigenvector problem and find initial con- 
ditions that violate convexity. We report that this pro- 
cedure greatly reduced the computational complexity of 
the problem: Rather than randomly searching for a gen- 
erator and an initial state, we only looked for a suitable 
generator by a simple algebraic procedure; nonconvex ini- 
tial conditions follow. 
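As an illustration of this procedure, the sketch below builds the Fisher matrix and the form $K$ for the three-state generator of Sec. I B and extracts a direction $k$ along which the second derivative of the relative entropy is negative. It is a minimal pure-Python sketch under stated assumptions: the eigenvalues $-402$, $-3$ and the modes $(-1,1,0)$, $(0,-1,1)$ are taken from Sec. I B, and the helper names (`fisher_matrix`, `k_matrix`, `lowest_eigenpair_2x2`) are hypothetical, not the authors' implementation.

```python
import math

P_STAR = [3 / 1206, 801 / 1206, 402 / 1206]     # steady state of the Sec. I B generator
LAMBDAS = [-402.0, -3.0]                        # fast and slow eigenvalues
MODES = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0]]    # corresponding eigenvectors q_+, q_-


def fisher_matrix(modes, p_star):
    """g^{ab} = (1/2) sum_i q^a_i q^b_i / p*_i."""
    n = len(modes)
    return [[0.5 * sum(qa * qb / ps for qa, qb, ps in zip(modes[a], modes[b], p_star))
             for b in range(n)] for a in range(n)]


def k_matrix(lambdas, g):
    """K = 2 L G L + L^2 G + G L^2 for diagonal L, i.e. K^{ab} = (l_a + l_b)^2 g^{ab}."""
    n = len(lambdas)
    return [[(lambdas[a] + lambdas[b]) ** 2 * g[a][b] for b in range(n)]
            for a in range(n)]


def lowest_eigenpair_2x2(k):
    """Smallest eigenvalue and a unit eigenvector of a symmetric 2x2 matrix."""
    a, b, d = k[0][0], k[0][1], k[1][1]
    low = 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)
    vec = [b, low - a]              # solves (a - low) x + b y = 0 when b != 0
    norm = math.hypot(vec[0], vec[1])
    return low, [v / norm for v in vec]


def quadratic_form(k, c):
    """c^T K c, the second-order estimate of the second derivative of H."""
    return sum(k[a][b] * c[a] * c[b] for a in range(len(c)) for b in range(len(c)))


G = fisher_matrix(MODES, P_STAR)
K = k_matrix(LAMBDAS, G)
DET_K = K[0][0] * K[1][1] - K[0][1] ** 2
K_MIN, K_VEC = lowest_eigenpair_2x2(K)
```

Here `DET_K` is negative, so `K_MIN` is a negative eigenvalue and any small multiple of `K_VEC` gives coefficients $c$ for which `quadratic_form(K, c)` is negative. The closed-form 2x2 eigen-solver is a convenience for three-state systems; larger systems would need a general symmetric eigenvalue routine.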



B. Time reversal close to equilibrium 

In this paragraph we show that the relative entropy of close-to-equilibrium systems obeys convexity, and how properties of the Fisher matrix are related to time-reversal symmetry. This analysis will be extended to the complex spectrum case in Appendix A.

Given that dissipative dynamics have a preferred time direction, the time-reversal generator [23, p. 47] is what comes closest to reverting the direction of time of a Markov process, by inverting certain nonequilibrium characteristics (e.g. steady currents) while preserving others (e.g. waiting times). It has been considered by various authors in relation to fluctuation theorems [24-26], to prove convexity for normal systems [S], to discuss spectral properties of Markov processes [27], and to identify a supersymmetry in Markovian dynamics [28].

The definition of time-reversal is more intuitive in the adjoint picture. Consider the steady state correlation of two functions $\sum_i p_i^* f_i g_i$ at time $t = 0$, and displace $f$ with the adjoint generator for a small time, $f(dt) = (1 + W^T dt) f$. We ask which generator should be employed to evolve $g(dt) = (1 + \overline{W}^T dt) g$ in such a way that the following time-reversal identity holds

$$\sum_i p_i^* f_i(dt)\, g_i = \sum_i p_i^* f_i\, g_i(dt).$$

The latter equation defines the adjoint of the time-reversed generator $\overline{W}^T$. Introducing the diagonal matrix of steady-state probabilities $P = \mathrm{diag}\{p_1^*, \ldots, p_n^*\}$ [Eq. (14)], an explicit calculation shows that the time-reversal generator is given by

$$\overline{W} = P W^T P^{-1}. \tag{15}$$



Some of its properties are: The transformation is involutive; The reversed dynamics affords the same steady state; Steady currents and affinities change sign; Exit probabilities out of states are unchanged; The spectra of $W$ and $\overline{W}$ coincide.

Equilibrium generators are those for which the time-reversed generator coincides with the original generator, $\overline{W} = W$. In practice, this translates into the condition of detailed balance $W_{ij}\, p_j^* = W_{ji}\, p_i^*$. We can further characterize equilibrium generators in terms of the Fisher matrix as follows. We define

$$V = P^{-1/2}\, W\, P^{1/2}. \tag{16}$$



Equation (16) is a similarity transformation, hence the spectra of $W$ and $V$ coincide, and eigenvectors are mapped into eigenvectors. Performing an analogous transformation on the reversed generator we obtain

$$\overline{V} = P^{-1/2}\, \overline{W}\, P^{1/2} = P^{1/2}\, W^T P^{-1/2} = V^T. \tag{17}$$



Hence the condition of detailed balance translates into $V$ being symmetric. By the spectral theorem it follows that its spectrum is real (equilibrium systems do not admit complex eigenvalues), and it affords a complete set of orthonormal eigenvectors $v^a$. Letting $v^0 = \sqrt{p^*}$ be the null eigenvector of $V$, all other eigenvectors $v^a$ can be normalized so as to have

$$2 g^{ab} = \sum_i v_i^a v_i^b = \delta^{ab}. \tag{18}$$



On the left-hand side one can recognize the Fisher matrix, by transforming back to the eigenvectors of $W$, given by $q^a = P^{1/2} v^a$. This transformation maps the Euclidean scalar product in the above equation into the scalar product $(\,\cdot\,,\,\cdot\,)$. Given that we followed a chain of necessary and sufficient facts, it is then proven that the Fisher matrix $G$ is diagonal if and only if the generator $W$ satisfies detailed balance. In this case we have

$$K = 4 \Lambda G \Lambda = 2 \Lambda^2, \tag{19}$$



which is obviously positive definite. Hence convexity 
holds for equilibrium systems, in the linear regime. We 
don't know whether a violation of convexity could occur 
out of the linear regime, where higher-order contribu- 
tions from the logarithm in the expression for the rela- 
tive entropy might come into play. By continuity, nearly 
equilibrium systems also satisfy convexity. 



C. Time-scale separation 



$$P = \mathrm{diag}\{p_1^*, \ldots, p_n^*\}, \tag{14}$$



The counterexample provided in Sec. I B is characterized by a wide separation of typical decay times. It is an interesting question whether time-scale separation is necessary for violating convexity. It certainly is not sufficient, as it is well known that the spectrum alone does not characterize the nonequilibrium character [27]. For example the generator

$$W = \begin{pmatrix} -202 & 201 & 1 \\ 201 & -202 & 1 \\ 1 & 1 & -2 \end{pmatrix} \tag{20}$$

has decay times 1/3 and 1/403, but it satisfies detailed balance, hence it does not violate convexity.
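The detailed-balance claim can be checked directly, and contrasted with the generator of Sec. I B. The sketch below is a hedged illustration in plain Python: `time_reversed` implements the entrywise form $(P W^T P^{-1})_{ij} = p_i^* W_{ji} / p_j^*$ of the time-reversed generator, and the uniform steady state used for the symmetric generator above is an inference from its symmetry (zero column sums then imply zero row sums), not a value quoted in the text. Function names are hypothetical.

```python
def time_reversed(w, p_star):
    """Time-reversed generator P W^T P^{-1}, entry (i, j) = p*_i W_{ji} / p*_j."""
    n = len(w)
    return [[p_star[i] * w[j][i] / p_star[j] for j in range(n)] for i in range(n)]


def satisfies_detailed_balance(w, p_star, tol=1e-12):
    """Check W_{ij} p*_j == W_{ji} p*_i for every pair of states."""
    n = len(w)
    return all(abs(w[i][j] * p_star[j] - w[j][i] * p_star[i]) <= tol
               for i in range(n) for j in range(n))


# Symmetric generator with decay times 1/3 and 1/403; uniform steady state (inferred).
W_EQ = [[-202.0, 201.0, 1.0],
        [201.0, -202.0, 1.0],
        [1.0, 1.0, -2.0]]
P_EQ = [1 / 3, 1 / 3, 1 / 3]

# Strongly unbalanced generator of Sec. I B with its steady state.
W_NEQ = [[-401.0, 1.0, 1.0],
         [400.0, -2.0, 1.0],
         [1.0, 1.0, -2.0]]
P_NEQ = [3 / 1206, 801 / 1206, 402 / 1206]

REV_EQ = time_reversed(W_EQ, P_EQ)      # coincides with W_EQ
REV_NEQ = time_reversed(W_NEQ, P_NEQ)   # differs from W_NEQ
```

For the symmetric generator the reversed generator reproduces the original one, while for the unbalanced generator it does not, even though its columns still sum to zero and it shares the same steady state.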

In the rest of this section we argue that a large time- 
scale separation might be necessary. We first hint at a 
general argument in favor of this conjecture, and then 
discuss the complications arising with nearly defective 
generators in the next section. 

Two consequences of time-scale separation are that concavity is extremely short-lived and that it has a short past. In fact, selecting initial conditions $c = k$ along a negative eigenvector of $K$, and perturbing them for a short time $\tau$, the time-evolved coefficients $c(\tau) \approx (1 + \Lambda \tau) k$ skew $k$ along the fastest mode, with typical time for restoring convexity given by the smaller decay time, $\tau = -\sup_a \lambda_a^{-1}$. For the same reason, mapping back in time with $\exp(-tW)$ leads soon to negative, nonphysical probabilities. Hence, nonconvex states are hardly encountered by "typical dynamics".

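Let us suppose that decay rates are not widely separated, i.e. that there exists some average value $\bar\lambda$ within the spectrum such that

$$\epsilon_a = \frac{\lambda_a - \bar\lambda}{\bar\lambda} \tag{21}$$

are all small. Defining the matrix

$$\epsilon = \mathrm{diag}\{\epsilon_1, \ldots, \epsilon_{n-1}\} \tag{22}$$

such that $\Lambda = \bar\lambda (I + \epsilon)$, with $I$ the $(n-1)$-dimensional unit matrix, and evaluating

$$\Lambda G \Lambda = \bar\lambda^2 \left( G + \epsilon G + G \epsilon + \epsilon G \epsilon \right), \tag{23a}$$
$$\Lambda^2 G = \bar\lambda^2 \left( G + 2 \epsilon G + \epsilon^2 G \right), \tag{23b}$$
$$G \Lambda^2 = \bar\lambda^2 \left( G + 2 G \epsilon + G \epsilon^2 \right), \tag{23c}$$

we find that

$$K = 4 \Lambda G \Lambda + \bar\lambda^2\, [\epsilon, [\epsilon, G]]. \tag{24}$$

Here $[\,\cdot\,,\,\cdot\,]$ is the commutator. Equation (24) states that corrections to the positive definite contribution $4 \Lambda G \Lambda$ are second-order in the eigenvalue spacings. If $w^T \Lambda G \Lambda w$ is finite for all vectors $w$ of finite norm, there are no possible initial conditions that will lead to a violation of convexity.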



D. Time-scale separation: Defective generators 

However, it can be the case that when eigenvalues are 
made closer to each other, eigenvectors of the generator 




FIG. 2: 20 000 randomly generated values of $\alpha$, $\beta$. The shaded
region corresponds to $\alpha^2 - \beta^2 < 1/2$.



also tend to overlap, in a limit where $W$ becomes defective, as it lacks a complete set of eigenvectors relative to degenerate eigenvalues. As one of the authors analyzed in Ref. [21], when $W$ is nearly defective, $G$ is nearly degenerate and it affords a nearly null eigenvector $w_0$. For example for three-state systems, one would have

$$G \propto \begin{pmatrix} 1 & 1 + O(\epsilon^2) \\ 1 + O(\epsilon^2) & 1 \end{pmatrix} \tag{25}$$

and $w_0 = (1, -1)$. Then choosing $w = \Lambda^{-1} w_0$ makes $w^T \Lambda G \Lambda w$ small of order $\epsilon^2$ as well. Notice that, among matrices with degenerate spectrum, matrices that afford a basis of eigenvectors are a set of zero measure with respect to defective matrices 41' , so the latter are quite crucial for our argumentation.

In this section, for three-state systems, we bring computational evidence that slightly departing from a defective generator, convexity still holds for any initial conditions, so that one might conclude that nonconvexity and time scale separation go hand in hand. A general proof seems to be elusive.

Consider a generic three-state system with real spectrum, with two real eigenvalues $-\lambda_\pm$ relative to eigenvectors $q_\pm$. Matrix $K$ is given by

$$K = \frac{1}{2} \begin{pmatrix} 4 \lambda_+^2\, (q_+, q_+) & (\lambda_+ + \lambda_-)^2\, (q_+, q_-) \\ (\lambda_+ + \lambda_-)^2\, (q_+, q_-) & 4 \lambda_-^2\, (q_-, q_-) \end{pmatrix} \tag{26}$$

and the determinant condition for convexity reads

$$\frac{4 \lambda_+ \lambda_-}{(\lambda_+ + \lambda_-)^2} > \frac{(q_+, q_-)}{\|q_+\|\, \|q_-\|} = \cos\varphi, \tag{27}$$

where the norm is calculated with respect to the scalar product $(\,\cdot\,,\,\cdot\,)$. We recognize in the right-hand side the cosine of the angle $\varphi$ between the two vectors, which vanishes when eigenmodes are orthogonal, i.e. for equilibrium systems. At the opposite extremum, $\cos\varphi$ reaches value 1 when eigenmodes are collinear; this only occurs when the system is defective, as it lacks a set of independent eigenvectors, in which case the two eigenvalues are
FIG. 3: In solid lines, plotted as functions of $t$, the first derivative of (a) the dynamical activity as a function of $\mu(t)$, and (c) its second-order approximation. In dashed lines, (d) one-half minus the second time derivative of the relative entropy as a function of $p(t)$, and (b) its second order approximation.



identical, and also the left-hand side attains value 1. In the vicinity of a defective generator, with slightly-spaced eigenvalues and eigenvectors

$$\lambda_\pm = \lambda (1 \pm \epsilon), \qquad q_\pm = x \pm \epsilon y, \tag{28}$$

we have $4 \lambda_+ \lambda_- / (\lambda_+ + \lambda_-)^2 = 1 - \epsilon^2$ and, to leading order in $\epsilon$,

$$\cos\varphi = 1 - \frac{2 \epsilon^2}{\|x\|^4} \left( \|x\|^2 \|y\|^2 - (x, y)^2 \right). \tag{29}$$

Introducing the two parameters $\alpha = \|y\| / \|x\|$ and $\beta = (x, y) / \|x\|^2$, inequality (27) becomes

$$\alpha^2 - \beta^2 > 1/2. \tag{30}$$

In Appendix B we give further details on how to express $x$ and $y$ (hence $\alpha$ and $\beta$) in terms of the transition rates of a nearly defective generator. In Fig. 2 we plotted 20 000 randomly generated values of $\alpha$, $\beta$ for defective generators, finding that none of them violates inequality (30). This suggests that, at least for three-state systems, a large separation of time scales is necessary.
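The expansion behind the criterion $\alpha^2 - \beta^2 > 1/2$ can be checked numerically. The sketch below is an illustrative test only: the weights `P_STAR` and the vectors `X`, `Y` are hypothetical values chosen to exercise the formulas and do not come from a generator in the paper. It compares the exact cosine between $q_\pm = x \pm \epsilon y$ with the second-order expression $1 - 2\epsilon^2(\alpha^2 - \beta^2)$, using the scalar product $(a, b) = \sum_i a_i b_i / p_i^*$.

```python
import math

# Hypothetical inputs chosen only to exercise the formulas (not from the paper).
P_STAR = [0.2, 0.3, 0.5]
X = [1.0, -0.4, -0.6]
Y = [0.3, 0.5, -0.8]
EPS = 1e-3


def dot(a, b, p_star=P_STAR):
    """Scalar product (a, b) = sum_i a_i b_i / p*_i."""
    return sum(ai * bi / pi for ai, bi, pi in zip(a, b, p_star))


def cos_angle(q_plus, q_minus):
    """Cosine of the angle between two modes in the weighted scalar product."""
    return dot(q_plus, q_minus) / math.sqrt(dot(q_plus, q_plus) * dot(q_minus, q_minus))


Q_PLUS = [x + EPS * y for x, y in zip(X, Y)]     # q_+ = x + eps * y
Q_MINUS = [x - EPS * y for x, y in zip(X, Y)]    # q_- = x - eps * y

ALPHA = math.sqrt(dot(Y, Y) / dot(X, X))         # alpha = ||y|| / ||x||
BETA = dot(X, Y) / dot(X, X)                     # beta = (x, y) / ||x||^2

COS_EXACT = cos_angle(Q_PLUS, Q_MINUS)
COS_APPROX = 1.0 - 2.0 * EPS**2 * (ALPHA**2 - BETA**2)
LHS = 1.0 - EPS**2    # 4 l+ l- / (l+ + l-)^2 for l± = l (1 ± eps)
```

The two cosines agree up to terms of order $\epsilon^4$, and the comparison `LHS > COS_EXACT` gives the same verdict as `ALPHA**2 - BETA**2 > 0.5`. For these arbitrary inputs the criterion fails, which is possible because they are not derived from an actual nearly defective generator.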



III. ADDITIONAL TOPICS 

A. Nonmonotonicity of the dynamical activity 

In the light of the above results, in this section we will 
present a discussion of the work by Maes, Netocny and 
Wynants [51E], to which we refer for the details. 

Consider a stochastic trajectory $\omega$ up to time $t$, and let $\mu_i^\omega(t)$ be the fraction of time spent on site $i$ along the trajectory. The question is, how typical is a set of values $\mu^\omega(t) = \mu(t)$? The answer is found in the framework of large deviation theory in terms of the Donsker-Varadhan rate functional or dynamical activity $D[\mu(t)]$, which describes fluctuations out of the most probable distribution $p^*$, for which it vanishes. Here, time plays the role of the extensive parameter; as $t \to \infty$, the steady state is exponentially favored. Instead, if $N$ trajectories are independently sampled at fixed time $t$, the rate functional for the probability of being at a site is given by relative entropy, with $N$ the extensive parameter. Therefore, relative entropy is a static rate functional while the dynamical activity is a dynamic one. The latter captures the Markovian nature of the process, while the former regards independent realizations.

All this given, the static nature of relative entropy as a 
large deviation functional does not prevent it from being 
monotonically decreasing, when evaluated along the solu- 
tion of a Markov process [8] . Does the Donsker-Varadhan 
functional monotonically decrease as well? Maes et al. 
showed that it doesn't when the initial state is picked 
far from the steady state, but that monotonic behavior 
is restored in the long time limit. They discussed a few 
examples supporting their case, and proved that normal 
systems (see Appendix A) display a longstanding monotonic
behavior. The remaining question is whether monotonicity
might occur with initial states picked along any
superposition of modes, arbitrarily close to the steady
state. Leading back to our previous counterexample, we
will show that this is not the case. More specifically,
our counterexample shows that one of the hypotheses for
Theorem III.1 in Ref. [5] is not generally satisfied.

Let $\mu(0)$ be the initial state of the system, $\mu(t) = \exp(tW)\mu(0)$ be its time-evolved state, and let us consider a time-dependent transformation of the transition rates

$$W^{u(t)}_{ij} = W_{ij}\, e^{[u_i(t) - u_j(t)]/2} \tag{31}$$

such that the generator $W^{u(t)}$ simulates steadiness at a frozen time $t$, that is, it affords $\mu(t)$ as its steady state:

$$W^{u(t)} \mu(t) = 0. \tag{32}$$

It can be proven that there exists a unique choice of $u(t)$, up to a ground potential, such that the above equation holds. Maes et al. proved that the Donsker-Varadhan functional is given by

$$D[\mu(t)] = \sum_{i \neq j} \left[ W_{ij} - W^{u(t)}_{ij} \right] \mu_j(t). \tag{33}$$



«j 



It affords a simple interpretation as the difference be- 
tween the average escape rate of the actual dynamics 
and that of the time-frozen steady dynamics at time t. 
According to Eq.(lO) in Ref. Jj^, when the state of the 
system is sufficiently close to the steady state, one has 



1 



D[lA = -7, (PW^u,W'^u) + (Pu,W 



(34) 



where we recall that $P$ is the matrix having the steady probability entries along its diagonal. Letting $u = P^{-1} p$, after some manipulations we obtain

$$\frac{d}{dt} D[\mu] = -\frac{1}{2} \left( (W + \overline{W})\, p,\ \overline{W} p \right). \tag{35}$$



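Similarly, we can express the second time derivative of the relative entropy to second order as

$$\ddot H[p] = \left( (W + \overline{W})\, p,\ W p \right). \tag{36}$$

We notice an analogy by interchanging the generator and its time reversed. Therefore, to second order

$$\frac{d}{dt} D\!\left[ \exp(t \overline{W})\, \mu(0) \right] = -\frac{1}{2} \frac{d^2}{dt^2} H\!\left[ \exp(t W)\, p(0) \right]. \tag{37}$$

The respective initial conditions are connected by

$$\overline{W}^{\,P^{-1} p(0)} \mu(0) = 0. \tag{38}$$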



In Fig. 3 we represent violation of monotonicity of the dynamical activity, using as time-reversed generator the one already employed in Sec. I B, namely Eq. (3). The initial state was chosen to have a relative entropy of 2.5% (in base 3 [13]). It is possible to reduce this measure of distance to smaller and smaller values, making all curves in the picture closer and closer. Notice that the correspondence between the dynamical activity and the relative entropy (to second order) only holds at the initial time. Later, the two differ for two reasons: Their time behavior is due to different dynamics; they are not the same functional of the probability distribution.

We point out that Lyapunov's second theorem for sta- 
bility assumes that there exists a function whose first 
derivative is negative in some neighborhood of a can- 
didate fixed point. Using our procedure, we were able 
to show that the dynamical activity and the first time 
derivative of the relative entropy do not satisfy this req- 
uisite. We note however that this fact is quite irrelevant, 
since the stability of steady states for irreducible Markov 
processes is well-established. 



B. Long-time behavior 

Since our results show that nonconvex transients of the 
relative entropy and nonmonotone transients of the dy- 
namical activity can occur arbitrarily close to the steady 
state, a natural question is whether a convex/monotone 
behavior is always restored in the long time limit when 
the dynamics is dominated by the mode with slowest de- 
cay rate. While this is a trivial fact for systems with 
real spectrum, for systems with complex spectrum it only 
holds when certain algebraic relations between the real 
and the imaginary parts of the eigenvalues are satisfied. 

In the real spectrum case, let $\lambda_1$ be the largest eigenvalue that affords a nonnull coefficient $c_1$ in the expansion of the initial state. We assume $\lambda_1$ to be nondegenerate. Then at large times $p(t) \approx p^* + c_1 e^{t \lambda_1} q^1$, and

$$H(t) \approx g^{11} c_1^2\, e^{2 t \lambda_1}, \tag{39}$$

which is obviously convex.

The case of systems with complex spectrum is discussed
at length in Appendix A. In general, the relative entropy
can be written as a quadratic form in terms of a Fisher
matrix that contains the information about the superposition
of the real and imaginary parts of the complex eigenmodes.
Let us only report that, by letting
$\lambda_1 = -\tau_1^{-1} + i\omega_1$ and $\lambda_1^*$ be the
complex conjugate eigenvalues with the largest real part
affording nonnull coefficients $c_1, c_1^*$ in the expression
for the initial state, $p = p^* + c_1 q^1 + c_1^* q^{1*}$,
convexity in the long time limit implies the following
relationship between real and imaginary parts of the
eigenvalues:

$$\frac{1}{1 + (\omega_1 \tau_1)^2} \geq \sqrt{1 - \frac{4 \det G_1}{(\mathrm{tr}\, G_1)^2}}. \tag{40}$$

$G_1$ is the $2 \times 2$ matrix having as entries the
superpositions between real and imaginary parts of the
relevant eigenvector, i.e.
$g_1^{\Re\Re} = (\Re q^1, \Re q^1)/2$, and so on. Letting
$g_+, g_-$ be the positive eigenvalues of $G_1$ with
$g_+ > g_-$, the above condition translates into

$$(\tau_1 \omega_1)^2 \leq \frac{2 g_-}{g_+ - g_-}. \tag{41}$$

In particular, the period of oscillation is (variably)
bounded from below by its corresponding relaxation
time. If convexity is to hold in the long time limit,
oscillations cannot be too fast with respect to their typical
exponential decay time. The upper frequency bound depends
on how the real and imaginary parts of the decay
mode overlap.
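
The frequency bound can be checked numerically. The following is a minimal sketch, assuming that at long times the quadratic form reduces to $H(t) \propto e^{-2t/\tau_1}\,[g_+ \cos^2\varphi(t) + g_- \sin^2\varphi(t)]$ with $\varphi(t) = \varphi_0 + \omega_1 t$, i.e. that the coefficient vector rotates and decays in the eigenbasis of $G_1$. The function names and the numerical values are illustrative only and do not come from the paper.

```python
import math


def convexity_bound(g_plus, g_minus):
    """Largest value of (tau * omega)^2 compatible with long-time convexity."""
    return 2.0 * g_minus / (g_plus - g_minus)


def scaled_second_derivative(g_plus, g_minus, tau, omega, phase):
    """Return e^{2t/tau} * H''(t) for H(t) = e^{-2t/tau} * (A + B*cos(phase)).

    Here A = (g_plus + g_minus)/2, B = (g_plus - g_minus)/2 and
    phase = 2*(phi0 + omega*t) is the instantaneous oscillation phase.
    """
    a = 0.5 * (g_plus + g_minus)
    b = 0.5 * (g_plus - g_minus)
    gamma = 2.0 / tau    # decay rate of the quadratic form
    freq = 2.0 * omega   # oscillation frequency of the quadratic form
    return gamma**2 * a + b * ((gamma**2 - freq**2) * math.cos(phase)
                               + 2.0 * gamma * freq * math.sin(phase))


def is_convex_at_long_times(g_plus, g_minus, tau, omega, n_phases=3600):
    """True if H'' >= 0 for every sampled phase of the oscillation."""
    phases = (2.0 * math.pi * k / n_phases for k in range(n_phases))
    return all(
        scaled_second_derivative(g_plus, g_minus, tau, omega, ph) >= 0.0
        for ph in phases
    )


# Illustrative eigenvalues of G_1: the bound on (tau * omega)^2 equals 1.
bound = convexity_bound(3.0, 1.0)
slow_oscillation = is_convex_at_long_times(3.0, 1.0, tau=1.0, omega=0.9)
fast_oscillation = is_convex_at_long_times(3.0, 1.0, tau=1.0, omega=1.1)
```

With these values the slow oscillation stays convex at all phases, while the fast one develops concave stretches once $(\tau_1\omega_1)^2$ exceeds the bound.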



C. On the information geometry of relative 
entropy and eigenmode estimation 

Before coming to conclusions, in this section we briefly
linger on the interpretation of the Fisher matrix.

Let $\hat\imath$ be a random variable taking values $i$ with
probability distribution $p_i(\vartheta)$ conditional on an
unknown parameter $\vartheta$ whose value one might want to
estimate. Fisher's information is defined as [13]

$$g(\vartheta) = \sum_i \frac{1}{p_i(\vartheta)} \left( \frac{\partial p_i(\vartheta)}{\partial \vartheta} \right)^2. \tag{42}$$

It measures how much information the random variable
retains about the parameter. The derivative with respect
to $\vartheta$ grants that $g(\vartheta)$ detects the sensitivity
of the probability to a parameter variation. For example, if the
probability distribution does not depend on the parameter
at all, it vanishes. An important result concerning
Fisher's information is that it sets a bound to the accuracy
of an estimation of the parameter $\vartheta$, expressed by
the Cramér-Rao inequality
$g(\vartheta)\, \mathrm{Var}(\hat\vartheta) \geq 1$, where
$\hat\vartheta$ is a so-called unbiased estimator of the
parameter $\vartheta$. For example, in the case where the
probability does not depend on $\vartheta$, the variance of the
estimator is infinite. Indeterminacy relations of this sort
have been related to quantum [29] and statistical [30]
uncertainty; see Ref. [31] for an application to temperature
estimation.
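
As a worked illustration of the bound, the sketch below evaluates Fisher's information by central finite differences for a two-state family $p(\vartheta) = (\vartheta, 1 - \vartheta)$, which is an assumed toy example and not a system from the paper. The indicator of the first state is an unbiased single-sample estimator of $\vartheta$ with variance $\vartheta(1-\vartheta)$, so the Cramér-Rao product equals one in this case. All function names are hypothetical.

```python
def fisher_information(family, theta, step=1e-6):
    """Fisher information sum_i (d p_i / d theta)^2 / p_i by central differences."""
    p = family(theta)
    p_up = family(theta + step)
    p_down = family(theta - step)
    total = 0.0
    for p_i, up_i, down_i in zip(p, p_up, p_down):
        derivative = (up_i - down_i) / (2.0 * step)
        if p_i > 0.0:
            total += derivative**2 / p_i
    return total


def two_state(theta):
    """Toy family: state 1 with probability theta, state 2 otherwise."""
    return [theta, 1.0 - theta]


def parameter_free(theta):
    """A distribution that ignores the parameter altogether."""
    return [0.3, 0.7]


theta = 0.3
g = fisher_information(two_state, theta)
estimator_variance = theta * (1.0 - theta)   # variance of the indicator of state 1
cramer_rao_product = g * estimator_variance  # the bound requires this to be >= 1
g_flat = fisher_information(parameter_free, theta)
```

The second family shows the degenerate case mentioned in the text: the information vanishes, so no finite-variance unbiased estimator of the parameter can exist.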



It has also long been known [13] that Fisher's information
is twice the relative entropy (Kullback-Leibler divergence)
$H\{\,\cdot\,\|\,\cdot\,\}$ between two nearby probability
distributions, to second order:

$$H\{p(\vartheta + d\vartheta)\,\|\,p(\vartheta)\} \approx \frac{1}{2}\, g(\vartheta)\, d\vartheta^2. \tag{43}$$

While this point of view slightly obscures the statistical
relevance for parameter estimation, it provides a clear
geometrical picture, since the relative entropy locally
defines a metric on submanifolds of the space of statistical
states. This metric is called the Fisher-Rao metric
[32]. Generalizing to several estimation parameters, we
can express the Fisher-Rao metric in local coordinates
$\vartheta = (\vartheta_a)_a$ as

$$H\{p(\vartheta + d\vartheta)\,\|\,p(\vartheta)\} \approx \frac{1}{2} \sum_{a,b} g^{ab}(\vartheta)\, d\vartheta_a\, d\vartheta_b. \tag{44}$$
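
The second-order relation between relative entropy and Fisher's information can be verified on a small example. The sketch below assumes a one-parameter family $p(\vartheta) = p_{\rm ref} + \vartheta\, d$ along a fixed zero-sum direction $d$, for which Fisher's information at $\vartheta = 0$ is $\sum_i d_i^2 / p_{{\rm ref},i}$. The reference distribution, the direction, and the function names are illustrative choices.

```python
import math


def relative_entropy(p, q):
    """Kullback-Leibler divergence H{p || q} for discrete distributions."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0.0)


def fisher_quadratic_form(p_ref, dp):
    """Second-order estimate (1/2) * sum_i dp_i^2 / p_ref_i."""
    return 0.5 * sum(d_i**2 / p_i for d_i, p_i in zip(dp, p_ref))


p_ref = [0.5, 0.3, 0.2]
direction = [1.0, -2.0, 1.0]   # sums to zero, so normalization is preserved
eps = 1e-3

dp = [eps * d for d in direction]
p_shifted = [p_i + d_i for p_i, d_i in zip(p_ref, dp)]

exact = relative_entropy(p_shifted, p_ref)
approx = fisher_quadratic_form(p_ref, dp)
relative_error = abs(exact - approx) / approx
```

The relative error is of the order of the displacement itself, as expected from a truncation at second order.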

The Fisher matrix then arises as one possible representation
of the metric in a set of preferred coordinates, which
are dictated by the problem at hand. For example, in
applications to equilibrium statistical mechanics the Fisher
matrix takes the form of a covariance matrix, coordinates
being the intensive variables conjugate to the physical
extensive observables in the equilibrium measure (temperature,
pressure, chemical potential, interaction constants
etc.) [33]. Using the square roots of the entries of the
probability density $\psi_i = \sqrt{p_i}$ as coordinates, one
obtains the real part of the Fubini-Study metric for quantum
states [34]. Studies on the quantum Fisher information
have also been proposed [35, 36]. Beyond equilibrium,
recent works [37] focus on how geodesic transport can
represent classes of nonequilibrium transformations. Far
from equilibrium, the Fisher information has been employed
to characterize the arrow of time [7].

In our work, the Fisher matrix is obtained by
parametrizing the probability distribution in the vicinity
of the steady state with a vector of variables
$\vartheta = c(t)$, at fixed time. Expressing the probability
increment along a small displacement from the steady state
as $dp = \sum_a \partial^a p(\vartheta)\, d\vartheta_a$, we can
interpret the decay modes $q^a = \partial^a p$ as tangent
vectors at $p^*$ [21]. In this guise, the Fisher matrix tells
how the probability density depends on eigenmodes, and in
particular how eigenmodes are correlated. More specifically,
perturbing the steady state in the $a$-th direction, and
defining the relative self-information carried by the $a$-th
mode as



K = log 



Pi 



^qtM 



we can interpret the Fisher matrix gab 



(45) 



oj,6\ 



'■{h^h 



as the correlation matrix between single-mode self- 
informations, which are uncorrelated for normal systems, 
including equilibrium systems. 

Within our framework, one can interpret the Cramér-Rao
inequality as a limit to the precision with which one
can estimate the weight of a mode at some time, hence,
more interestingly, as a limit on our ability to trace back
the initial conditions. In this respect, eigenvalues play
a role analogous to Lyapunov's exponents in dynamical
systems. Suppose we want to estimate the permanence
of eigenmodes in a state at some given time (considering
that, more often, one will want to estimate the value of
an observable associated with eigenmodes). Unbiased
estimators of the coefficients $c_a(t)$ can be intuitively
built as follows. Suppose at time $t$ we sample $N$ independent
realizations of a stochastic jump process whose probability
distribution is described by a master equation, obtaining
data $x_1, \ldots, x_N$. Since each datum was sampled
with probability $p(t)$, the probability of the sample is
$p_{x_1}(t) \cdots p_{x_N}(t)$. Let
$f_i = N^{-1} \sum_{n=1}^{N} \delta_{i,x_n}$ be the empirical
distribution of such samples, that is, a histogram.
Consistently, if we average the empirical distribution over
possible samples we obtain the original distribution (we
drop the time-dependence hereafter):

$$\langle f_i \rangle = \sum_{x_1, \ldots, x_N} p_{x_1} \cdots p_{x_N}\, f_i = p_i. \tag{46}$$



We project the empirical distribution onto the left
eigenvectors $\bar q^a$ of $W$ such that
$(\bar q^a, q^b) = \delta^{ab}$. The empirical coefficients
$\hat c_a = \sum_i \bar q^a_i f_i$ are unbiased estimators,
since one can easily show that $\langle \hat c_a \rangle = c_a$.
Let $C_{ab} = \langle (\hat c_a - c_a)(\hat c_b - c_b) \rangle$
be their covariance matrix. The multivariate Cramér-Rao bound
then states that $CG \geq N^{-1} I$, where the matrix inequality
$A \geq B$ means that $A - B$ is positive semidefinite. Notice
that the bound becomes less strict as the number of samples is
increased.
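
The unbiasedness of the histogram, and hence of any estimator that is linear in it, can be confirmed exactly by enumerating all possible samples. The sketch below does so with rational arithmetic for a three-state distribution and a small number of samples. The distribution and the projection vector are arbitrary illustrative choices (the vector is a stand-in, not an actual left eigenvector of a generator), and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def mean_histogram(p, n_samples):
    """Exact average of the empirical distribution over all samples of size n_samples."""
    n_states = len(p)
    mean = [Fraction(0)] * n_states
    for sample in product(range(n_states), repeat=n_samples):
        # Probability of this particular sequence of independent draws.
        weight = Fraction(1)
        for x in sample:
            weight *= p[x]
        # Each datum contributes 1/N to the histogram entry of its state.
        for x in sample:
            mean[x] += weight * Fraction(1, n_samples)
    return mean


def mean_linear_estimator(p, projection, n_samples):
    """Exact average of sum_i projection_i * f_i over all samples."""
    histogram = mean_histogram(p, n_samples)
    return sum(l_i * f_i for l_i, f_i in zip(projection, histogram))


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
projection = [Fraction(1), Fraction(-2), Fraction(4)]   # illustrative stand-in

averaged_histogram = mean_histogram(p, n_samples=3)
averaged_coefficient = mean_linear_estimator(p, projection, n_samples=3)
true_coefficient = sum(l_i * p_i for l_i, p_i in zip(projection, p))
```

Because the estimator is linear in the histogram, its average coincides with the projection of the true distribution for any number of samples; only its variance depends on $N$.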

For equilibrium systems, using an orthonormal set of
modes with respect to the scalar product
$(\,\cdot\,,\,\cdot\,)$, we have $C_{aa} \geq N^{-1}$, which is a
statement about the variance of individual estimators; the
information stored in the estimators "decouples". However, by a
straightforward calculation one can show that the simple
estimators that we built are not uncorrelated, even for
equilibrium systems. Building uncorrelated maximum-likelihood
estimators, using orthogonality of the Fisher parameters, is
an important task in estimation theory [38]. In this
respect the theory states that equilibrium systems are more
tractable for parameter estimation.

Another case of interest is that of nearly defective 
systems. Using the nearly degenerate Fisher matrix in 



Eq.(25), and evaluating WqCGwq along vector wq 
(1, -^, one obtains 



2C12 



Cll - C22 > 



e2iV' 



(47) 



which as $\epsilon \to 0$ implies that the covariance matrix is
singular, and that the cross-correlation must diverge, as
one will not be able to distinguish between the two modes.
This kind of behavior has been related to classical
and quantum phase transitions [39, 40]. It is interesting
to note that in the context of Markovian dynamics,
this critical behavior is accompanied by polynomial
terms in the time evolution [21].



IV. CONCLUSIONS AND PERSPECTIVES



In this paper we showed how algebraic properties of the
Fisher matrix, in a basis of decay modes, can be useful
to tackle specific issues regarding nonequilibrium Markov
processes, such as the monotonicity and the convexity of
(candidate) Lyapunov functions.

In particular, we were able to produce counterexamples
to the convexity of relative entropy with respect to
the steady state in the "nonequilibrium linear regime",
i.e. with initial conditions picked arbitrarily close to a
nonequilibrium steady state. From a thermodynamic
perspective, this tells us that there is no general principle
of minimum nonadiabatic entropy production, which
would represent the nonequilibrium analogue of a well-known
stability criterion for close-to-equilibrium systems
[3]. However, our counterexamples display a very subtle
fine-tuning of the initial conditions, and for three-state
systems we argued that a large separation of time scales
has to be attained. If both these facts could be rigorously
proven and extended to more general systems, since the
nonconvex regime has the typical lifetime of the shortest
decay time, such eventual transients would be proven
to be completely irrelevant with respect to the dominant
dynamics, and one would be able to argue that "for all
practical purposes" such generalized principles do hold.

Our discussion strongly relied on Fisher's information,
a concept from estimation theory that has long been
employed in equilibrium statistical mechanics, and is
now being more and more explored with applications to
nonequilibrium systems. Our use of this interesting tool
was quite restrained, but we envisage that the techniques
hereby introduced could be useful to discuss stability and
fluctuations of nonequilibrium systems and possible
relationships regarding decay modes and eigenvalues. For
example, much work on the interplay between the imaginary
and the real parts of complex eigenvalues has yet
to be addressed. As briefly described in the last section,
other interesting applications of Fisher's information
might come from exploiting the full machinery of
information geometry and estimation theory, in particular
as regards the actual definition of unbiased estimators
and the implications of the Cramér-Rao bound. Finally,
normal systems might deserve further attention, as they
are, in a way, the nonequilibrium analogue of detailed
balance systems.



Appendix A: Theory and results: Complex spectrum

In this section we discuss the construction of the Fisher
matrix for systems with complex spectrum, describing
conditions for convexity violation, and introducing a class
of nonequilibrium generators (called "normal") for which
convexity holds. They are peculiar with respect to time
reversal, in a way that might be considered a nonequilibrium
generalization of detailed balance.

We introduce the case of complex spectrum with the
following chain of considerations. Let a three-state system
have two real decay modes. The Fisher matrix reads

$$G = \begin{pmatrix} (q_+, q_+) & (q_+, q_-) \\ (q_+, q_-) & (q_-, q_-) \end{pmatrix}, \tag{A1}$$

and the square root of its determinant is the area of the
parallelogram formed by the two vectors. The area of
a parallelogram coincides with the area of the parallelogram
formed by its diagonals, $(q_+ + q_-)/2$ and $q_- - q_+$.
We rescale the diagonals by factors $\sqrt{2}$ and $1/\sqrt{2}$
respectively, while keeping the area invariant, and define vectors
$q_1 = (q_+ + q_-)/\sqrt{2}$, $p_2 = (q_+ - q_-)/\sqrt{2}$ and the
tilted Fisher matrix

$$\tilde G = \begin{pmatrix} (q_1, q_1) & (q_1, p_2) \\ (q_1, p_2) & (p_2, p_2) \end{pmatrix}. \tag{A2}$$

We have $\det \tilde G = \det G$. Notice that $\tilde G$ is obtained
from $G$ after a rotation of an angle $\pi/2$ of the defining vectors.
In general, when one performs a change of basis in the
space of modes, $q^a \to \sum_b A_{ba}\, q^b$, $G$ transforms by
matrix congruence $G \to A^T G A$, which is a similarity of
matrices only when $A$ is orthogonal, $A^{-1} = A^T$. For equilibrium
systems $G \propto I$; for defective systems $G$ is degenerate.
Both these properties remain true for all representatives
under the orthogonal transformation $A$; such properties
are equivalently described by $G$ or $\tilde G$.
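
The invariance of the determinant under the tilt can be illustrated numerically. The sketch below assumes that the scalar product is the steady-state-weighted one, $(u, v) = \sum_i u_i v_i / p^*_i$, which is not restated in this section and is therefore an assumption here; the steady state and the two zero-sum vectors are illustrative numbers, and the function names are hypothetical.

```python
import math


def scalar_product(u, v, p_star):
    """Assumed weighted scalar product (u, v) = sum_i u_i * v_i / p*_i."""
    return sum(u_i * v_i / p_i for u_i, v_i, p_i in zip(u, v, p_star))


def gram_matrix(u, v, p_star):
    """2x2 matrix of scalar products of the vectors u and v."""
    return [
        [scalar_product(u, u, p_star), scalar_product(u, v, p_star)],
        [scalar_product(v, u, p_star), scalar_product(v, v, p_star)],
    ]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def trace2(m):
    return m[0][0] + m[1][1]


p_star = [0.25, 0.275, 0.475]
q_plus = [-18.0, 3.0, 15.0]   # illustrative real modes (entries sum to zero)
q_minus = [0.0, -9.0, 9.0]

# Tilted vectors: rescaled diagonals of the parallelogram spanned by the modes.
q_1 = [(a + b) / math.sqrt(2.0) for a, b in zip(q_plus, q_minus)]
p_2 = [(a - b) / math.sqrt(2.0) for a, b in zip(q_plus, q_minus)]

fisher = gram_matrix(q_plus, q_minus, p_star)
tilted = gram_matrix(q_1, p_2, p_star)
```

Since the change of basis is orthogonal, both the determinant and the trace of the two matrices agree up to rounding.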

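When the generator admits a couple of complex-conjugate
eigenmodes

$$q_\pm = (q_1 \pm i q_2)/\sqrt{2}, \tag{A3}$$

relative to complex-conjugate eigenvalues
$\lambda_\pm = -1/\tau \pm i\omega$, the matrix defined in
Equation (A1) has complex entries. It is meaningful to perform
a rotation in the complex plane. Switching to the tilted matrix,
with $p_2 = i q_2$, we have

$$\tilde G = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} \begin{pmatrix} (q_1, q_1) & (q_1, q_2) \\ (q_1, q_2) & (q_2, q_2) \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}. \tag{A4}$$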



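Notice that $\det \tilde G < 0$. A candidate as a Fisher matrix for
systems with complex spectrum is then given by $G = |\tilde G|$.
Generalizing, when the system has both real eigenvalues
labelled by $a$ and complex eigenvalues labelled by $k$, we
define the Fisher matrix

$$G = \begin{pmatrix} (q_1^k, q_1^{k'}) & (q_1^k, q_2^{k'}) & (q_1^k, q^{a'}) \\ (q_2^k, q_1^{k'}) & (q_2^k, q_2^{k'}) & (q_2^k, q^{a'}) \\ (q^a, q_1^{k'}) & (q^a, q_2^{k'}) & (q^a, q^{a'}) \end{pmatrix}_{k,k',a,a'}.$$

Now consider the state

$$p(t) = p^* + \sum_a c_a(t)\, q^a + \sum_k \left[ c_k(t)\, q^k + c_k^*(t)\, q^{k*} \right]. \tag{A5}$$

Define $c_k^1 = (c_k + c_k^*)/\sqrt{2}$ and
$c_k^2 = (c_k - c_k^*)/(i\sqrt{2})$ and collect the data in a vector

$$c = (c_k^1, c_k^2, c_a)_{k,a}. \tag{A6}$$

It is a simple exercise to prove that the relative entropy in
the linear regime reads $H = c^T G c$. The time-derivative
of $c$ is also easily calculated,

$$\dot c = -(T + i\Omega)\, c, \tag{A7}$$

where we introduced the matrix

$$T + i\Omega = \begin{pmatrix} \tau_k^{-1} & \omega_k & 0 \\ -\omega_k & \tau_k^{-1} & 0 \\ 0 & 0 & \tau_a^{-1} \end{pmatrix}_{k,a} \tag{A8}$$

and $T = \mathrm{diag}\{\tau_k^{-1}, \tau_k^{-1}, \tau_a^{-1}\}_{k,a}$.
Exponentiating Equation (A7) gives the typical decaying-oscillating
character. Finally, evaluating the second time derivative of the
relative entropy we obtain $K = K_1 + K_2$, with

$$K_1 = 2 (T - i\Omega)\, G\, (T + i\Omega), \tag{A9}$$

$$K_2 = (T - i\Omega)^2 G + G\, (T + i\Omega)^2. \tag{A10}$$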

Normal systems are those whose generators commute
with their time reversal:

$$W \overline{W} = \overline{W} W. \tag{A11}$$

As already pointed out above, $W$ and $\overline{W}$ always have
the same spectrum, but they might not have the same eigenvectors.
Normal generators do. Let $q_\pm^k$ be the complex
conjugate eigenvectors of $W$ with respect to $\lambda_\pm^k$.
Applying $\overline{W}$ to the eigenequation we obtain

$$W \overline{W} q_\pm^k = \overline{W} W q_\pm^k = \lambda_\pm^k\, \overline{W} q_\pm^k, \tag{A12}$$

which implies that also $\overline{W} q_\pm^k$ is an eigenvector of
$W$, relative to eigenvalue $\lambda_\pm^k$. Then $q_\pm^k$ must be
an eigenvector of $\overline{W}$. Now, since matrix $W\overline{W}$
must have positive spectrum (being similar to $HH^T$, which is
symmetric hence with real spectrum), then one necessarily has that

$$\overline{W} q_\pm^k = \lambda_\mp^k\, q_\pm^k, \tag{A13}$$

since $\lambda_\pm^k \lambda_\mp^k$ are the only real products of
eigenvalues.

To summarize, normal systems are such that the time reversal
has the same spectrum and eigenvectors as the
original dynamics, but it inverts positive and negative
frequency modes. Time reversal inverts the oscillatory
character (much like for quantum mechanical systems),
while damping occurs in the same way.

By application of the spectral theorem to normal matrices
[22], it can be proven that for normal systems the
Fisher matrix $G$ is diagonal, or in other words eigenmodes
can be normalized, yielding $G = I$. Therefore we obtain

$$K = 4T^2 > 0. \tag{A14}$$

Hence normal systems satisfy convexity.
Moreover, for normal systems we have

$$K_2 = 2 (T^2 - \Omega^2), \tag{A15}$$



which is positive on its own if and only if relaxation
times are smaller than the corresponding decay periods,
i.e. $1/\tau_k > \omega_k$, $\forall k$. This property seems to be
valid and has already been conjectured by Maes et al. Indeed, even
for non-normal systems, contrarily to the real case, we report
that in an intensive numerical search over three-state generators
we were not able to find systems that have a non-positive $K_2$.
Hence convexity seems to be more robust for systems with complex
spectrum.
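
The two contributions to the second derivative can be evaluated explicitly on a single complex block. The sketch below assumes that, on such a block, $T + i\Omega$ is the real $2 \times 2$ matrix with $1/\tau$ on the diagonal and $\pm\omega$ off the diagonal, and that $G = I$ as for normal systems. It is an illustration with hypothetical helper names, not the authors' numerical procedure.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def scale(s, a):
    return [[s * a[i][j] for j in range(2)] for i in range(2)]


def second_derivative_matrices(tau, omega, g):
    """Return (K1, K2) on one complex block for a given 2x2 Fisher matrix g."""
    rate = 1.0 / tau
    m = [[rate, omega], [-omega, rate]]   # assumed real form of T + i*Omega
    mt = transpose(m)                     # corresponds to T - i*Omega
    k1 = scale(2.0, matmul(matmul(mt, g), m))
    k2 = add(matmul(matmul(mt, mt), g), matmul(g, matmul(m, m)))
    return k1, k2


identity = [[1.0, 0.0], [0.0, 1.0]]

# Fast relaxation (1/tau > omega): K2 is positive on its own.
k1_fast, k2_fast = second_derivative_matrices(tau=0.5, omega=1.0, g=identity)
k_fast = add(k1_fast, k2_fast)

# Slow relaxation (1/tau < omega): K2 turns negative, yet K1 + K2 stays positive.
k1_slow, k2_slow = second_derivative_matrices(tau=2.0, omega=1.0, g=identity)
k_slow = add(k1_slow, k2_slow)
```

In both cases the sum reduces to $4/\tau^2$ times the identity, while the sign of $K_2$ alone is decided by the comparison between $1/\tau$ and $\omega$.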



Appendix B: Defective three-state system 

The task is to express parameters $\alpha, \beta$ in terms of
the transition rates of a generic nearly defective three-state
generator. Let us introduce the quadratic oriented-spanning-tree
polynomial $z$ [5] and the linear polynomial $t$, given by minus
the trace of $W$,

$$t = w_{21} + w_{31} + w_{12} + w_{32} + w_{13} + w_{23}, \tag{B1a}$$

$$z = w_{12}w_{13} + w_{32}w_{13} + w_{12}w_{23} + w_{21}w_{13} + w_{21}w_{23} + w_{31}w_{23} + w_{31}w_{12} + w_{21}w_{32} + w_{31}w_{32}. \tag{B1b}$$

The system has a degenerate spectrum when $t^2 = 4z$. We
perturb to first order the eigenvalues near the degenerate
spectrum,
$\lambda_\pm = \frac{1}{2}\left(-t \pm \sqrt{t^2 - 4z}\right) \approx -\sqrt{z}\,(1 \mp \epsilon)$,
from which $\lambda = -\sqrt{z}$. The steady state and the decay
modes are given by

$$p^* = \frac{1}{z} \begin{pmatrix} w_{12}w_{13} + w_{32}w_{13} + w_{12}w_{23} \\ w_{21}w_{13} + w_{21}w_{23} + w_{31}w_{23} \\ w_{31}w_{12} + w_{21}w_{32} + w_{31}w_{32} \end{pmatrix}, \qquad q_\pm = \begin{pmatrix} (w_{13} + w_{23} + \lambda_\pm)(w_{12} + w_{32} + \lambda_\pm) - w_{23}w_{32} \\ w_{23}w_{31} + w_{21}(w_{13} + w_{23} + \lambda_\pm) \\ w_{32}w_{21} + w_{31}(w_{12} + w_{32} + \lambda_\pm) \end{pmatrix}, \tag{B2}$$

wherefrom one can read off values of $x, y$ in terms of the
transition rates. A more compact representation can be
given as follows. Letting $e_1 = (1,0,0)$, and $w_1$ be the
first column of the generator, we can express the decay
modes as $q_\pm = p^* - e_1 + z^{-1}\lambda_\pm w_1$, which can be
proven by plugging this expression into the eigenvector equation,
and multiplying by $\lambda_\mp$,

$$\lambda_\mp (W - \lambda_\pm)\, q_\pm = (W + t)\, w_1 + z\, (e_1 - p^*), \tag{B3}$$

where we used $\lambda_+ \lambda_- = z$ and
$\lambda_+ + \lambda_- = -t$. The above expression can be shown to
vanish by direct calculation. We then obtain

$$x = p^* - e_1 - w_1/\sqrt{z}, \qquad y = w_1/\sqrt{z}, \tag{B4}$$

and similarly we can express parameters $\alpha, \beta$ in terms of
the transition rates.
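
The explicit formulas of this appendix lend themselves to a direct numerical check. The sketch below builds a three-state generator from illustrative rates (chosen here so that the spectrum is real and nondegenerate, not taken from the paper), assuming the convention that $w_{ij}$ is the rate of jumps from state $j$ to state $i$, and verifies that the steady state, the decay modes of (B2), and the compact representation all satisfy their eigenvector equations. Function names are hypothetical.

```python
import math


def generator(w):
    """Three-state generator; w[i, j] is the assumed rate of jumps j -> i."""
    return [
        [-(w[2, 1] + w[3, 1]), w[1, 2], w[1, 3]],
        [w[2, 1], -(w[1, 2] + w[3, 2]), w[2, 3]],
        [w[3, 1], w[3, 2], -(w[1, 3] + w[2, 3])],
    ]


def polynomials(w):
    """Minus the trace (t) and the oriented-spanning-tree polynomial (z)."""
    t = w[2, 1] + w[3, 1] + w[1, 2] + w[3, 2] + w[1, 3] + w[2, 3]
    z = (w[1, 2] * w[1, 3] + w[3, 2] * w[1, 3] + w[1, 2] * w[2, 3]
         + w[2, 1] * w[1, 3] + w[2, 1] * w[2, 3] + w[3, 1] * w[2, 3]
         + w[3, 1] * w[1, 2] + w[2, 1] * w[3, 2] + w[3, 1] * w[3, 2])
    return t, z


def steady_state(w):
    _, z = polynomials(w)
    return [
        (w[1, 2] * w[1, 3] + w[3, 2] * w[1, 3] + w[1, 2] * w[2, 3]) / z,
        (w[2, 1] * w[1, 3] + w[2, 1] * w[2, 3] + w[3, 1] * w[2, 3]) / z,
        (w[3, 1] * w[1, 2] + w[2, 1] * w[3, 2] + w[3, 1] * w[3, 2]) / z,
    ]


def eigenvalues(w):
    """Nonzero eigenvalues, roots of lambda^2 + t*lambda + z = 0 (real case)."""
    t, z = polynomials(w)
    root = math.sqrt(t * t - 4.0 * z)
    return 0.5 * (-t + root), 0.5 * (-t - root)


def decay_mode(w, lam):
    """Unnormalized decay mode from the explicit formula (B2)."""
    return [
        (w[1, 3] + w[2, 3] + lam) * (w[1, 2] + w[3, 2] + lam) - w[2, 3] * w[3, 2],
        w[2, 3] * w[3, 1] + w[2, 1] * (w[1, 3] + w[2, 3] + lam),
        w[3, 2] * w[2, 1] + w[3, 1] * (w[1, 2] + w[3, 2] + lam),
    ]


def compact_mode(w, lam):
    """Compact representation p* - e_1 + (lam / z) * w_1."""
    _, z = polynomials(w)
    p = steady_state(w)
    w_1 = [row[0] for row in generator(w)]   # first column of the generator
    e_1 = [1.0, 0.0, 0.0]
    return [p_i - e_i + lam / z * c_i for p_i, e_i, c_i in zip(p, e_1, w_1)]


def residual(w, lam, q):
    """Largest entry of |(W - lam) q|; zero for an exact eigenvector."""
    big_w = generator(w)
    return max(abs(sum(big_w[i][j] * q[j] for j in range(3)) - lam * q[i])
               for i in range(3))


rates = {(1, 2): 1.0, (2, 1): 2.0, (1, 3): 1.0,
         (3, 1): 1.0, (2, 3): 3.0, (3, 2): 6.0}
t, z = polynomials(rates)
lam_plus, lam_minus = eigenvalues(rates)
```

For these rates $t = 14$ and $z = 40$, so the two decay rates are well separated; moving the rates toward $t^2 = 4z$ would approach the nearly defective regime discussed in the text.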
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